Prototype Matching - Finding Meaning in the Books of the Bible

Authors

  • Ari Visa
  • Jarmo Toivonen
  • Hannu Vanharanta
Abstract

Text documents are commonly characterised and classified by keywords that their authors assign to describe them. Visa et al. have, however, developed a new methodology based on prototype matching. The prototype is a document of interest or an extracted passage of interest. This prototype is matched against an existing document database or a monitored document flow. Our claim is that the new methodology is capable of extracting meaning automatically from the contents of a document. To verify this hypothesis, a test was designed using the Bible. Two translations, one in English and one in Finnish, were selected as test material. Verification tests, which searched for the ten nearest books to every book of the Bible, were performed with a prototype version of the software application. The test results are reported in this paper.

The new methodology is based on a hierarchy of Self-Organizing Maps (SOM) and on a smart encoding of words. The words of a text document are encoded, and the encoded words are represented as word vectors. The word vectors are clustered by the SOM, which creates a word map. The words of the document are then replaced with their addresses on the word map, so the document becomes a sequence of addresses that retains information about word order. Next, the document is considered sentence by sentence. The resulting sentence vectors are clustered by a SOM, which creates a sentence map. The sentences are replaced with their addresses on the sentence map, so the document again becomes a sequence of addresses, now carrying information about the different types of sentences. The document is then considered paragraph by paragraph. The paragraphs are treated as context vectors and clustered by a SOM; the resulting map is called a context map. The paragraphs are replaced with their addresses on the context map.
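The three-level hierarchy described above (word map, sentence map, context map) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the "smart encoding" or how sentence and context vectors are formed, so `encode_word` (scaled character codes) and `histogram` (normalised address counts, which discard the order information the abstract mentions) are hypothetical stand-ins, and the map sizes are arbitrary.

```python
import math
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def address(weights, vec):
    """Index of the best-matching unit: the 'address' of vec on the map."""
    return min(range(len(weights)), key=lambda i: sqdist(weights[i], vec))

def train_som(data, rows, cols, epochs=20, seed=0):
    """Train a small rectangular SOM and return its list of unit weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = 0.5 * (1.0 - frac)  # learning rate decays linearly
            sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 1e-3
            bmu = address(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood around the best-matching unit.
                h = math.exp(-sqdist(grid[i], grid[bmu]) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights

def encode_word(word, dim=8):
    """Hypothetical encoding (assumption): scaled character codes, zero-padded."""
    codes = [min(ord(ch), 255) / 255.0 for ch in word.lower()[:dim]]
    return codes + [0.0] * (dim - len(codes))

def histogram(addresses, size):
    """Assumed vectorisation: normalised counts of map addresses."""
    counts = [0.0] * size
    for a in addresses:
        counts[a] += 1.0
    n = len(addresses)
    return [c / n for c in counts] if n else counts

def build_hierarchy(paragraphs):
    """paragraphs: list of paragraphs, each a list of sentences (lists of words).
    Returns one context-map address per paragraph."""
    word_vecs = [encode_word(w) for p in paragraphs for s in p for w in s]
    word_map = train_som(word_vecs, 4, 4)
    sent_vecs = [[histogram([address(word_map, encode_word(w)) for w in s],
                            len(word_map)) for s in p] for p in paragraphs]
    sentence_map = train_som([v for p in sent_vecs for v in p], 3, 3)
    ctx_vecs = [histogram([address(sentence_map, v) for v in p],
                          len(sentence_map)) for p in sent_vecs]
    context_map = train_som(ctx_vecs, 2, 2)
    return [address(context_map, v) for v in ctx_vecs]
```

Calling `build_hierarchy` on a tokenised document yields one integer per paragraph, which is the "sequence of addresses on the context map" that the abstract describes as the final representation.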
Finally, the document consists of a sequence of addresses on the context map. A more detailed description of the methodology can be found in several proceedings [10, 7, 11, 9, 8].

The test hypothesis was that the words, the word order in sentences, and the order of sentences in paragraphs capture a higher level of information than ordinary word-based searches. It was important to find a text that is well translated into at least two languages, and the Bible was selected. Each of the 66 books of the Bible was used as a prototype in both the English and the Finnish version, and a window of the ten closest books was considered. The window size of ten was selected to guarantee statistical significance. Two tests were designed. In the first test, the number of books in the window that came from the same testament as the prototype (the Old Testament or the New Testament, respectively) was counted for each book. In the second test, the books that appeared in the window in both the English and the Finnish version were considered. The results of these tests are statistically significant. The methodology is capable of understanding the contents of a document at least on a certain level.

Proceedings of the 34th Hawaii International Conference on System Sciences, 2001. (c) 2001 IEEE.
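The two window tests described in the abstract can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: the abstract does not say how the distance between two books is computed, so the functions take a precomputed distance matrix `dist` (hypothetical) as input. The `chance_level` helper is likewise an added reference point, giving the expected number of same-testament books in a randomly chosen window; the paper's own significance test is not described here.

```python
def nearest_window(dist, i, k=10):
    """Indices of the k books closest to book i (book i itself excluded)."""
    others = [j for j in range(len(dist)) if j != i]
    return sorted(others, key=lambda j: dist[i][j])[:k]

def same_group_count(dist, labels, i, k=10):
    """First test: how many books in the window share book i's testament."""
    return sum(labels[j] == labels[i] for j in nearest_window(dist, i, k))

def window_overlap(dist_a, dist_b, i, k=10):
    """Second test: how many books appear in the window in both translations."""
    window_a = set(nearest_window(dist_a, i, k))
    window_b = set(nearest_window(dist_b, i, k))
    return len(window_a & window_b)

def chance_level(n_same, n_total, k=10):
    """Expected same-group count if the k window books were drawn at random
    from the other n_total - 1 books (mean of a hypergeometric distribution)."""
    return k * (n_same - 1) / (n_total - 1)
```

With the usual 66-book division into 39 Old Testament and 27 New Testament books, `chance_level(39, 66)` is about 5.8 and `chance_level(27, 66)` is 4.0, so same-testament counts well above these values would indicate that the windows are not random.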

Similar resources

You Say Periklute, I Say Paraclete: Towards a Reconciliation Between the Bible and the Quran

The Quranic statement that Jesus predicted Muhammad by name is examined in light of the expectation of what the “kingdom of God” was. The concept of the kingdom of God as being the light or fire of the Holy Spirit at Pentecost is contrasted with the Sufi concept of the “Light of Muhammad.” Pentecost could be the Light of Muhammad coming upon the apostles of Christ; the Light is the same, but kn...

Patterns of the meaning of good practice, with an emphasis on High school Social studies books

The purpose of this article is to identify and describe the meaning of good in social studies textbooks of the first secondary school. The methodology of this article is based on the combined quantitative and qualitative approach. According to the obtained findings, it is determined that: 1- Good as public and social participation 2- Good as a transsexual matter 3- Good as a transhuman practice...

On the computational complexity of finding a minimal basis for the guess and determine attack

Guess-and-determine attack is one of the general attacks on stream ciphers. It is a common cryptanalysis tool for evaluating security of stream ciphers. The effectiveness of this attack is based on the number of unknown bits which will be guessed by the attacker to break the cryptosystem. In this work, we present a relation between the minimum numbers of the guessed bits and uniquely restricted...

The meaning of place, A constant or changing quality? Lynch,Rapoport and Semiotics view points

The matter of meaning in place is one of the main qualities of human life. People consciously or unconsciously look for meanings in places. Finding the meaning of place is important because understanding the meaning will lead to “act” in place. Finding the place friendly, or finding it insecure, will lead to acting differently. Now the question is: is the meaning of place something ...

Job Finding and Inflow to Unemployment : The Case of Iran

In order to analyze the labor market through search and matching theory, we need deep parameters namely, rate of inflow to the unemployment pool and job-finding rate. In other words, these rates are primary parameters of matching function; hence, estimating these parameters is an essential step for the use of search and matching theory in every economy. In this paper, we estimate these rates of...

Publication date: 2000